Abstract

Information hiding for covert communication is rapidly gaining momentum. As more sophisticated steganographic techniques are developed, steganalysis needs to be universal. In this paper we propose Universal Steganalysis using Histogram, Discrete Fourier Transform and SVM (SHDFT). A stego image has irregular statistical characteristics compared with its cover image. Statistical features generated from the histogram and the DFT are used to train a One-Class SVM to discriminate between cover and stego images. The SHDFT algorithm is found to be efficient and fast, since it uses fewer statistical features than the existing algorithm.
I. Introduction

Covert communication and the exchange of secret information have been known for a long time, and there are numerous examples of covertness in communication. Steganography, watermarking and cryptography are some of the techniques used for hiding information [1]. Information can be hidden in several media, such as digital images, text, audio, web pages and video. Steganography is the art and science of embedding a secret payload into a cover medium: redundant bits in the cover medium are identified and replaced with the payload. The aim of steganography is to ensure that the presence of the hidden message is undetectable. Techniques used for embedding data include Least Significant Bit (LSB), Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), Spread Spectrum (SS) and palette-based techniques. Software tools for embedding a payload in digital images and videos, e.g., S-Tools, J-Steg, Outguess, F5 and Steghide, are available on the Internet.

Steganography is also used by criminals and terrorists to exchange information about their illegal activities, and government agencies and standards organizations use steganalysis to counter these activities. Steganalysis aims to discover a hidden message and determine its extent. It can be either blind (universal) or non-blind (embedding specific). In the universal case the embedding scheme is unknown, so the aim is to detect the existence of hidden data; in the embedding-specific case the embedding technique is known, so the payload size can also be estimated. Steganalysis tools are available both freely and as commercial software. The Stegdetect tool detects the Jsteg, Jphide, Invisible Secrets, Outguess, F5 and Camouflage steganographic schemes in JPEG images. Tools based on chi-square analysis perform a statistical attack to detect hidden data in BMP images.
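To illustrate how a chi-square attack can work on spatial-domain images, the Python sketch below computes a simplified pairs-of-values statistic: LSB replacement tends to equalize the counts of each pair of values (2i, 2i+1), so a small statistic suggests embedding. This is an assumed textbook formulation (no p-value, no sliding window), not the implementation of any particular tool, and the function name `pov_chi_square` is hypothetical.

```python
from collections import Counter


def pov_chi_square(pixels):
    """Simplified pairs-of-values chi-square statistic for 8-bit samples.

    For each pair (2i, 2i+1), the expected count of the even value under full
    LSB replacement is the mean of the two counts; squared deviations are summed.
    Returns (statistic, number_of_pairs_used).
    """
    counts = Counter(pixels)
    statistic = 0.0
    pairs_used = 0
    for i in range(128):
        even = counts.get(2 * i, 0)
        odd = counts.get(2 * i + 1, 0)
        expected = (even + odd) / 2
        if expected > 0:  # skip empty pairs to avoid division by zero
            statistic += (even - expected) ** 2 / expected
            pairs_used += 1
    return statistic, pairs_used


# Unequal pair counts (cover-like) vs. equalized pair counts (stego-like).
print(pov_chi_square([10] * 40 + [11] * 10))  # (9.0, 1)
print(pov_chi_square([10] * 25 + [11] * 25))  # (0.0, 1)
```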

The success of steganography depends on various factors, such as the embedding algorithm, the compression algorithm and the extent to which image properties are altered. LSB is the most commonly used technique for image steganography; it uses bitwise operations to manipulate the least significant bits of the cover image. These minute changes are imperceptible to the human eye, and hiding information in the LSBs can be likened to adding noise to the image. Image formats commonly used in steganography are Windows Bitmap (BMP), Graphics Interchange Format (GIF) and Joint Photographic Experts Group (JPEG); the embedding must ensure that the hidden information is not lost when the image is transformed. Lossless formats such as BMP and GIF are chosen for LSB techniques, in which modifications are usually made in the spatial domain. JPEG uses lossy compression, and data hiding in JPEG images is usually done in the frequency domain.
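The LSB replacement described above can be sketched in a few lines of Python. This is a minimal illustration, assuming 8-bit samples in a flat list and a payload that has already been converted to bits; the function names are hypothetical, and the sketch ignores keys, headers and file formats.

```python
def embed_lsb(cover, bits):
    """Replace the least significant bit of the first len(bits) samples."""
    if len(bits) > len(cover):
        raise ValueError("payload exceeds cover capacity")
    stego = list(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the payload bit
    return stego


def extract_lsb(stego, n_bits):
    """Read back the first n_bits least significant bits."""
    return [value & 1 for value in stego[:n_bits]]


cover = [200, 201, 202, 203]
stego = embed_lsb(cover, [1, 0, 1, 1])
print(stego)                  # [201, 200, 203, 203]
print(extract_lsb(stego, 4))  # [1, 0, 1, 1]
```

Each sample changes by at most one intensity level, which is why the modification behaves like low-amplitude noise.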

A general steganalysis method has not yet been developed, since each steganographic method embeds the payload in a different way. In classical steganographic schemes, security lies in keeping the encoding technique secret, whereas modern schemes adopt Kerckhoffs's principle from cryptography, so that security depends on the secret key used to encode the payload.

Several classifiers are used for pattern recognition: Bayesian multivariate classifiers, Fisher Linear Discriminant (FLD), Neural Networks (NN) and Support Vector Machines (SVM). SVM is based on statistical learning theory, and its performance depends strongly on the kernel function used to map the features, e.g., Gaussian Radial Basis Function, Polynomial, Exponential Radial Basis Function, Multi-Layer Perceptron, Splines, B-Splines, Additive Kernels and Tensor Product kernels. SVMs can be One-Class or Multi-Class. In a One-Class SVM, only the features of cover images are required to build the classifier that separates cover and stego images, whereas a Multi-Class SVM requires the features of both cover and stego images to discriminate between them. Owing to their simple geometric interpretation, SVMs are used more widely than neural networks. The biggest limitations of SVMs are the choice of kernel function and their speed and size in both training and testing.
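Three of the kernels listed above can be written directly. The forms below follow common textbook definitions (Gaussian RBF with width σ, exponential RBF using the unsquared distance, and an inhomogeneous polynomial of degree d); the parameter defaults are assumptions, and this sketch does not imply which kernel SHDFT uses.

```python
import math


def gaussian_rbf(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def exponential_rbf(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y|| / (2 sigma^2)), using the unsquared distance."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.exp(-dist / (2 * sigma ** 2))


def polynomial(x, y, degree=2):
    """K(x, y) = (<x, y> + 1)^degree."""
    return (sum(a * b for a, b in zip(x, y)) + 1) ** degree


print(gaussian_rbf([0.0, 0.0], [0.0, 0.0]))  # 1.0
print(polynomial([1, 2], [3, 4]))            # 144
```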

Contribution: In this paper, we propose the SHDFT universal steganalysis method. The histogram and the Discrete Fourier Transform of color images are computed to obtain PDF moments; in addition, the energy of the DFT coefficients is considered. A non-linear Support Vector Machine classifier is used to classify cover and stego images.

II. Related work 



Anderson and Petitcolas [2] have proposed two mathematical frameworks for steganography: informatics and theoretical models. The quality of the retrieved image is poor with this approach. Petitcolas et al. [3] have presented a survey on information hiding that elucidates its applications and terminology. They also describe certain limitations of information hiding: JPEG compression, low-pass filtering, cropping, scaling, additive Gaussian noise and rotation all cause distortion in the image. An understanding of these limitations is helpful in steganalysis.

Fridrich et al. [4] have presented a steganalytic method that reliably identifies stego images generated by the F5 steganographic algorithm in JPEG images. The method estimates the cover image histogram from the stego image; the statistical deviation from the cover image is then determined by a least-squares fit that compares the estimated histogram with the histogram of the DCT coefficients of the stego image. This technique can be extended to other steganographic algorithms that manipulate quantized DCT coefficients. Siwei Lyu and Hany Farid [5] have described an approach that uses multi-scale wavelet decomposition to build a higher-order statistical model for detecting hidden data in grayscale images. A Support Vector Machine is used to detect statistical deviations in a test image.

Farid [6] has described a steganalytic method that detects hidden messages by building a higher-order statistical model of the image from its wavelet decomposition. The model extracts basic coefficient statistics and error statistics from an optimal magnitude predictor. FLD is used to distinguish cover and stego images. Siwei Lyu and Hany Farid [7] have described a universal steganalysis algorithm that exploits the strong statistical regularities that exist between the color channels of natural images. The statistical model is generated from first- and higher-order statistics of the image, and a One-Class SVM is used to detect hidden data.

Hany Farid and Siwei Lyu [8] have employed multi-scale wavelet decomposition to build a statistical model for images. The statistical regularities exhibited by natural images are captured by a first- and higher-order statistical model, and FLD is used to classify cover and stego images. The usefulness of this model is explored in digital forensics, including steganalysis and distinguishing natural photographs from computer-generated images.

Siwei Lyu and Hany Farid [9] have proposed universal steganalysis using higher-order statistics (SHS). A wavelet decomposition based on quadrature mirror filters (QMF) is used to obtain the statistics of an image. Magnitude statistics are obtained from a linear predictor, and phase statistics from a Gaussian pyramid and a local angular harmonic decomposition. One-Class and Multi-Class SVMs are used for classification. The disadvantage of this algorithm is that a large number of features is required to train the SVM; hence, detecting a stego image takes more time.
III. Model

The steganalysis model is discussed in this section. 
Steganalysis Model 

Figure 1 shows the block diagram of SHDFT, which discriminates between cover and stego images using an SVM.

Color image: JPEG and BMP color images are used for the analysis. Each image consists of three color planes, c ∈ {r, g, b}. The color planes are separated, and a 1D histogram is computed for each of them.

Histogram: Histograms are the basis for numerous spatial-domain processing techniques and provide useful image statistics. Image processing operations such as steganographic embedding change the image histogram. The histogram is computed for each color plane of the image, and the first and second PDF moments, i.e., the mean and variance of the histogram coefficients of each color plane, are calculated, yielding 6D statistical features.
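A minimal Python sketch of the 6D histogram features, assuming each color plane is given as a flat list of 8-bit values and that "histogram coefficients" means the 256 bin counts. The dictionary layout and function names are hypothetical.

```python
from statistics import fmean, pvariance


def histogram(plane, bins=256):
    """Count occurrences of each 8-bit intensity value."""
    counts = [0] * bins
    for value in plane:
        counts[value] += 1
    return counts


def histogram_features(image):
    """Mean and variance of the histogram coefficients of each plane: 6 values."""
    features = []
    for c in ("r", "g", "b"):
        h = histogram(image[c])
        features += [fmean(h), pvariance(h)]
    return features


image = {"r": [0, 0, 255, 255], "g": [10, 10, 10, 10], "b": [1, 2, 3, 4]}
print(len(histogram_features(image)))  # 6
```

Under this reading, the mean of the bin counts always equals the number of pixels divided by 256, so for images of equal size the image-dependent information is carried by the variance.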

Discrete Fourier Transform (DFT): The DFT transforms a sequence of N complex spatial-domain coefficients x(0), x(1), ..., x(N−1) into a sequence of N complex frequency-domain coefficients X(0), X(1), ..., X(N−1) according to the formula

$$X(k) = \sum_{n=0}^{N-1} x(n)\, \exp\!\left(\frac{-j 2\pi k n}{N}\right)$$

where n = 0, 1, ..., N−1 and k = 0, 1, ..., N−1.
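The equation above can be evaluated directly, as in the reference sketch below. This O(N²) loop is only for illustration; for a 256-bin histogram it is fast enough, but a practical implementation would use an FFT such as `numpy.fft.fft`, which computes the same transform with the same sign convention.

```python
import cmath


def dft(x):
    """Direct evaluation of X(k) = sum_n x(n) exp(-j 2 pi k n / N)."""
    n_total = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_total) for n in range(n_total))
        for k in range(n_total)
    ]


spectrum = dft([1, 1, 1, 1])
print([round(abs(c), 6) for c in spectrum])  # [4.0, 0.0, 0.0, 0.0]
```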

The DFT is applied to the histogram of each color channel, c ∈ {r, g, b}. The PDF moments, i.e., the mean, variance, skewness and kurtosis of the DFT coefficients, are computed, yielding 12D statistics. The total energy of the DFT coefficients of each channel is also computed, yielding 3D statistics.
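One possible reading of these 15 features is sketched below in Python. The paper does not say how the complex DFT coefficients enter the moments, so this sketch assumes that the moments are taken over the magnitudes |X(k)|, uses population (non-excess) kurtosis, and defines the energy as Σ|X(k)|²; all names are hypothetical.

```python
import cmath
import math
from statistics import fmean


def dft(x):
    """Direct DFT, X(k) = sum_n x(n) exp(-j 2 pi k n / N)."""
    n_total = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_total) for n in range(n_total))
            for k in range(n_total)]


def four_moments(values):
    """Mean, variance, skewness and (non-excess) kurtosis of a sample."""
    mu = fmean(values)
    var = fmean([(v - mu) ** 2 for v in values])
    if var == 0:
        return [mu, 0.0, 0.0, 0.0]  # constant sample: report 0 for the higher moments
    sd = math.sqrt(var)
    skew = fmean([((v - mu) / sd) ** 3 for v in values])
    kurt = fmean([((v - mu) / sd) ** 4 for v in values])
    return [mu, var, skew, kurt]


def dft_features(histograms):
    """12 moment features followed by 3 energy features for planes r, g, b."""
    moments, energies = [], []
    for c in ("r", "g", "b"):
        magnitudes = [abs(coef) for coef in dft(histograms[c])]
        moments += four_moments(magnitudes)
        energies.append(sum(m * m for m in magnitudes))
    return moments + energies
```

By Parseval's relation, Σ|X(k)|² = N·Σ|x(n)|², so under this definition each energy feature equals N times the sum of the squared histogram counts of that plane.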

SVM: In a One-Class SVM, only the features of cover images are required to classify cover and stego images. The 24D feature vector is used to train the SVM.
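Once a One-Class SVM has been trained on cover-image features, classifying a new feature vector reduces to evaluating its decision function f(x) = Σ αᵢ K(sᵢ, x) − ρ and labelling the image as cover when f(x) ≥ 0. The sketch below assumes a Gaussian RBF kernel and takes the support vectors sᵢ, coefficients αᵢ and offset ρ as given; training (for example with LIBSVM or scikit-learn's `OneClassSVM`) is not shown, and the toy 2D parameters are hypothetical.

```python
import math


def rbf(x, y, gamma):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def decision(x, support_vectors, alphas, rho, gamma):
    """One-Class SVM decision value: sum_i alpha_i K(s_i, x) - rho."""
    return sum(a * rbf(s, x, gamma) for s, a in zip(support_vectors, alphas)) - rho


def classify(x, support_vectors, alphas, rho, gamma):
    """Inside the learned cover region -> 'cover', outside -> 'stego'."""
    return "cover" if decision(x, support_vectors, alphas, rho, gamma) >= 0 else "stego"


# Toy example with hypothetical trained parameters.
sv, alphas, rho, gamma = [[0.0, 0.0], [1.0, 0.0]], [0.5, 0.5], 0.3, 1.0
print(classify([0.5, 0.0], sv, alphas, rho, gamma))  # cover
print(classify([5.0, 5.0], sv, alphas, rho, gamma))  # stego
```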
Figure 1: Block Diagram of SHDFT

Features: The mean of the difference between the DFT and the histogram is computed for each color plane, yielding 3D statistics. In total, SHDFT yields 24D features, which are used to discriminate between cover and stego images.
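Written out, one plausible reading of the difference feature and of the total feature count is given below. It assumes that h_c(k) denotes the k-th of the N histogram coefficients of plane c, that X_c(k) is the DFT of that histogram as defined above, and that the DFT magnitude is used; the paper does not state these details explicitly.

$$d_c = \frac{1}{N}\sum_{k=0}^{N-1}\big(|X_c(k)| - h_c(k)\big), \qquad c \in \{r, g, b\}$$

$$\underbrace{3 \times 2}_{\text{histogram moments}} + \underbrace{3 \times 4}_{\text{DFT moments}} + \underbrace{3}_{\text{DFT energies}} + \underbrace{3}_{d_r,\, d_g,\, d_b} = 24$$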

IV. Algorithm 

Table 1: Algorithm for SHDFT

Input: The test image.
Output: The SVM classifies the test image as cover or stego.

1. Separate the color planes, c ∈ {r, g, b}, of the test image.

2. Build a 1D histogram for each color plane c ∈ {r, g, b}. Compute the 1st and 2nd moments, i.e., the mean and variance of the histogram coefficients. This yields 6D statistics.

3. Compute the Discrete Fourier Transform of the histogram of each color plane c ∈ {r, g, b}. Compute the 1st, 2nd, 3rd and 4th moments, i.e., the mean, variance, skewness and kurtosis of the DFT coefficients. This yields 12D statistics. Also compute the total energy of the DFT coefficients, which yields 3D more statistics.

4. Compute the 1st-order moment (mean) of the difference between the histogram and the DFT for each color channel. This yields 3D statistics.

5. The features obtained in steps 2, 3 and 4 form a 24D feature vector.

6. The feature vectors obtained in step 5 are used to train the SVM and to classify the test image.

Figure 2: The payload images (a)(b), BMP cover images (c)(d) and JPEG cover images (e)(f).

V. Performance Analysis

For the performance analysis we consider 100 color images in JPEG and BMP formats, each 600×400 pixels in size (86 kilobytes). Figure 2 (a)(b) shows the payloads, and Figure 2 (c)(d) and (e)(f) show the cover images in BMP and JPEG formats, respectively. Payloads of various sizes are embedded into the full-resolution cover images to generate 100 stego images, using the JPhide software for JPEG images and S-Tools 4.0 for BMP images. The payload sizes are 6.0, 4.7, 1.2 and 0.3 kilobytes, corresponding to 100%, 78%, 20% and 5% of the total cover capacity. The capacity varies with the size of the cover image, and the steganographic capacity (embedding rate) is the ratio of the size of the embedded payload to the total cover capacity. Each stego image is generated with the same quality factor as the original cover image so as to minimize statistical variations.

Figure 3 and Table 2 compare the detection rates of the proposed SHDFT algorithm and the existing SHS algorithm for different embedding rates in BMP images using S-Tools. The detection rate increases with the embedding rate for both algorithms. Figure 4 and Table 3 compare the detection rates of SHDFT and SHS for different embedding rates in JPEG images using JPhide and S-Tools. The proposed SHDFT algorithm requires only 24D statistics to train the SVM, compared with the 432D statistics required by the existing SHS algorithm for the detection of payload [9].
Table 2
DETECTION RATE VS. EMBEDDING RATE FOR BMP IMAGES USING S-TOOLS

Embedding Rate (%)    SHDFT Detection Rate (%)    SHS Detection Rate (%)
100                   96.42                       85.71
78                    92.67                       78.57
20                    89.87                       71.42
5                     86.78                       64.28



Figure 3: Comparison of detection rates for various payload sizes for the SHDFT and SHS algorithms on BMP images using OC-SVM (bar chart; panel: S-Tools; series: SHDFT, SHS).



Table 3
DETECTION RATE VS. EMBEDDING RATE FOR JPEG IMAGES USING JPHIDE

Embedding Rate (%)    SHDFT Detection Rate (%)    SHS Detection Rate (%)
100                   98.67                       89.91
78                    95.73                       80.57
20                    91.32                       76.42
5                     86.78                       68.28



Figure 4: Comparison of detection rates for various payload sizes for the SHDFT and SHS algorithms on JPEG images using OC-SVM (bar chart; panel: JPhide).



VI. Conclusion 

With a growing number of steganographic techniques being developed, steganalysis needs a universal approach. We present a steganalysis technique that builds a statistical model using the histogram and the Discrete Fourier Transform, with an SVM used to classify stego and cover images. In our experiments this model achieves higher detection rates than the SHS algorithm and requires less computation time, as the number of features required to train the SVM is much smaller than in other techniques. In future work, the algorithm may be tested at embedding rates below 5%.